[KERNEL] Add e2e tests for writing tables with collated columns #5357

ilicmarkodb · 2025-10-17T01:18:17Z

Which Delta project/connector is this regarding?

Description

Added e2e tests for writing tables with collated columns. To enable writing collated data, the schema comparison had to be modified in some places.

How was this patch tested?

New tests.

Does this PR introduce any user-facing changes?

Yes, users can now create and write to Delta tables with collated types.

allisonport-db

Tests and approach LGTM; I just have some concerns about how we compare schemas and data types throughout the code base.

allisonport-db · 2025-10-24T00:52:32Z

kernel/kernel-api/src/main/java/io/delta/kernel/types/DataType.java

    return equals(dataType);
  }

+  /**


Is it worth having this + the other equivalent method or might they satisfy the same goal?

I think equivalentIgnoreCollations is needed because of cases like https://github.com/delta-io/delta/pull/5357/files#diff-c9fd0f3e881617ea0f2439a29c32a35e8c32fbcdda229105be76ece4001819acR200. Here we don't want to ignore names and metadata.
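
To make the distinction concrete, here is a minimal sketch of a comparison that ignores collations but still checks names and metadata. The `Field` class, the helper methods, and the collation strings are hypothetical stand-ins for illustration only; they are not the Kernel classes and do not reflect how the PR implements the method.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the Kernel classes.
final class CollationSketch {

  /** A schema field: name, base type, collation, and metadata. */
  static final class Field {
    final String name;
    final String baseType;
    final String collation;
    final Map<String, String> metadata;

    Field(String name, String baseType, String collation, Map<String, String> metadata) {
      this.name = name;
      this.baseType = baseType;
      this.collation = collation;
      this.metadata = metadata;
    }
  }

  /** Strict comparison: every attribute, including the collation, must match. */
  static boolean strictEquals(List<Field> a, List<Field> b) {
    return sameFields(a, b, true);
  }

  /** Relaxed comparison: collations may differ; names, types, and metadata may not. */
  static boolean equivalentIgnoreCollations(List<Field> a, List<Field> b) {
    return sameFields(a, b, false);
  }

  private static boolean sameFields(List<Field> a, List<Field> b, boolean compareCollation) {
    if (a.size() != b.size()) {
      return false;
    }
    for (int i = 0; i < a.size(); i++) {
      Field x = a.get(i);
      Field y = b.get(i);
      if (!x.name.equals(y.name)
          || !x.baseType.equals(y.baseType)
          || !x.metadata.equals(y.metadata)) {
        return false;
      }
      if (compareCollation && !x.collation.equals(y.collation)) {
        return false;
      }
    }
    return true;
  }

  /** Builds a one-column string schema with the given name and collation. */
  static List<Field> schema(String name, String collation) {
    Map<String, String> noMetadata = Collections.emptyMap();
    return Collections.singletonList(new Field(name, "string", collation, noMetadata));
  }

  public static void main(String[] args) {
    List<Field> table = schema("c1", "UTF8_BINARY");
    List<Field> data = schema("c1", "UTF8_LCASE");
    List<Field> renamed = schema("c2", "UTF8_BINARY");

    System.out.println(strictEquals(table, data)); // false: collations differ
    System.out.println(equivalentIgnoreCollations(table, data)); // true
    System.out.println(equivalentIgnoreCollations(table, renamed)); // false: names differ
  }
}
```

In this sketch a renamed column is still rejected, which is the behavior described above: only the collation is exempt from the comparison.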

allisonport-db · 2025-10-24T00:56:06Z

kernel/kernel-api/src/main/java/io/delta/kernel/Transaction.java


          ColumnarBatch data = filteredBatch.getData();
-          if (!data.getSchema().equals(tableSchema)) {
+          if (!data.getSchema().equivalentIgnoreCollations(tableSchema)) {


I left a similar comment on the other PR, but how do we know when and where we should use this instead? How can we be sure that none of the comparisons we do throughout the code base will now be an issue? I'm concerned there might be somewhere that would fail only if a test had collations in the schema, which the majority of existing tests won't have.

It seems like we need to audit everywhere we might compare data types or schemas?

Example: https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/statistics/DataFileStatistics.java#L383 is this one okay?

I'm worried we do this type of comparison all over.
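
One possible shape for such an audit, as a rough sketch only: register each comparison site as a predicate and probe it with a pair of types that differ only in collation. Everything here is an assumption made for illustration (the `ComparisonAudit` class, the site names, and the `"string COLLATE ..."` type strings); none of it is Kernel code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical audit harness; site names and type strings are made up for illustration.
final class ComparisonAudit {

  /** Drops a trailing " COLLATE <name>" suffix so only the base type remains. */
  static String stripCollation(String type) {
    int idx = type.indexOf(" COLLATE ");
    return idx < 0 ? type : type.substring(0, idx);
  }

  /** Returns the names of comparison sites that reject types differing only in collation. */
  static List<String> collationSensitiveSites(Map<String, BiPredicate<String, String>> sites) {
    String plain = "string";
    String collated = "string COLLATE UTF8_LCASE";
    List<String> flagged = new ArrayList<>();
    for (Map.Entry<String, BiPredicate<String, String>> site : sites.entrySet()) {
      if (!site.getValue().test(plain, collated)) {
        flagged.add(site.getKey());
      }
    }
    return flagged;
  }

  public static void main(String[] args) {
    Map<String, BiPredicate<String, String>> sites = new LinkedHashMap<>();
    // A site that still uses strict equality.
    sites.put("strictSite", String::equals);
    // A site that has been switched to a collation-insensitive comparison.
    sites.put("relaxedSite", (lhs, rhs) -> stripCollation(lhs).equals(stripCollation(rhs)));

    System.out.println(collationSensitiveSites(sites)); // prints [strictSite]
  }
}
```

A flagged site is not necessarily a bug, since some comparisons should stay strict; the list would only show which call sites need a deliberate decision.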

ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch 5 times, most recently from 53c0334 to 9e24ec3 Compare October 22, 2025 22:14

temp

f5735af

ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch 2 times, most recently from c05daf9 to 13e75c4 Compare October 23, 2025 17:02

fix

9472b72

ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch 11 times, most recently from f8839d0 to e95e836 Compare October 23, 2025 20:21

temp

bc298a1

ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch from e95e836 to bc298a1 Compare October 23, 2025 20:42

allisonport-db reviewed Oct 24, 2025


ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch from c165c05 to bb11dc1 Compare October 24, 2025 13:57

partition tests added

114903b

ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch from bb11dc1 to 114903b Compare October 24, 2025 13:58

ilicmarkodb added 2 commits October 24, 2025 16:25

style fix

98c35e1

implement equivalent for StringType

39e2708
